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Abstract 

In this paper we take a new look at numerical techniques for solving parabolic equations by the method of lines. 
The main motivation for the proposed approach is the possibility of exploiting a high degree of parallelism 
in a simple manner. The basic idea of the method is to approximate the action of the evolution operator on a 
given state vector by means of a projection process onto a Krylov subspace. Thus, the resulting approximation 
c o ns i sts of applying an evolution operator of very small dimension to a known vector which is, in turn, 
computed accurately by exploiting well-known rational approximations to the exponential. Because the rational 
approximation is only applied to a small matrix, the only operations required with the original large matrix are 
matrix-by- vector multiplications, and as a result the algorithm can easily be parallelized and vectorized. Some 
relevant approximation and stability issues are discussed. We present some numerical experiments with the 
method and compare its performance with a few explicit and implicit algorithms. 
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1 Introduction 

In recent years there has been a resurgence of interest in explicit methods for solving parabolic partial 
differential equations, motivated mainly by the desire to exploit the parallel and vector processing capabilities 
of new supercomputer architectures. The main attraction of explicit methods is their simplicity, since the 
basic operations involved in them are matnx-by-vector multiplications, which, in general, are rather easy to 
parallelize and vectorize. On the other hand, the stringent constraint on the size of the time steps required 
to ensure stability reduces efficiency to such an extent that the use of implicit methods becomes almost 
mandatory when integrating on long time intervals. Implicit methods do not suffer from stability related 
restrictions but have another disadvantage: they require the solution of linear systems that are often large and 
sparse. For this reason implicit methods tend to be far more difficult to implement on parallel machines than 
their explicit counterparts, which only require matrix-by-vector multiplications. Thus, the trade-off between 
the two approaches seems to be a large number of matrix-by-vector multiplications on the one hand, versus 
linear systems to solve on the other. For two-dimensional and, more important, for three-dimensional 
problems, methods of an explicit type might be attractive if implemented with care. 

We next observe that in spite of the above conventional wisdom, the distinction between the explicit 
and implicit approaches is not always clear. Consider the simple system of ordinary differential equations 
y' = A y + /. We first point out that if our only desire is to use exclusively matrix-by-vector products 
as operations with the matrix A, for example, for the purpose of exploiting modem architectures, then 
certainly standard explicit methods do not constitute the only possibility. For example, one may use an 
implicit scheme and solve the linear systems approximately by some iterative method, such as the conjugate 
gradient method, with no preconditioning or with diagonal preconditioning. When the accuracy required for 
each linear system is very low, then the method will be akin to an explicit scheme, although one of rather 
unusual type since its coefficients will vary at every step. As the accuracy required for the approximations 
increases, the method will start moving towards the family of purely implicit methods. Therefore, if we 
were to call “explicit” any scheme that requires only matrix-vector products, then the borderline between 
the two approaches is not so well-defined. 

Wfe would like to take advantage of this observation to develop schemes that are intermediate between 
explicit and implicit. In this paper we will use the term polynomial approximation method for any scheme 
for which the only operations required with the matrix A are matrix-by-vector multiplications. The number 
of such operations may vary from one step to another and may be large. 

To derive such intermediate methods we explore systematically the ways in which an approximation 
to the local behavior of the ordinary differential equations can be obtained by using polynomials in the 
operator A. Going back to the comparison sketched above, we note that the process involved in one single 
step of an implicit method is often simply an attempt to generate some approximation to the operation 
exp(£t A)v via a rational approximation to the evolution operator [49]. If a CG-like method is used to 
solve the linear systems arising in the implicit procedure, the result will be a polynomial scheme. Thus, 
there are two phases of approximation: the first is obtaining a rational or polynomial approximation to the 
exponential, and the second is solving the linear systems by some iterative method. We would like to reduce 
these two phases to only one by attempting to directly approximate exp(£t A)v. The basic idea is to project 
the exponential of the large matrix A into a small Krylov subspace. 

To make the discussion more specific and introduce some notation, we consider the following linear 
parabolic partial differential equation: 

= —Lu(x,t) + r(x), (1) 

at 

u(0,x) = «o, x € ft, 
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u(t,x) = x G dtl,t > 0, 

where —L is a second order partial differential operator of the elliptic type, acting on functions defined 
on the open, bounded and connected set fl. Using a method of lines (MOL) approach, equation (1) is first 
discretized with respect to space variables, resulting in the system of ordinary differential equations 

-jp- = -Aw(t) + r, (2) 

u>(0) = wq. 

For the remainder of our discussion, we will assume A to be time-independent. In this situation the solution 
is explicitly given by 


w(t) = A 1 r + e tA (wo~A ! r) (3) 

If we let w(t) = w(t) - A~ l r, and accordingly, wq = xdq- A~ x t , then (3) is equivalent to the following 
expression: 


w(t) = e~ tA WQ. 

Note that when r = 0, w(t ) is the same as ti>(£). An ideal one-step method would consist of a scheme of the 
form 


w(t + S) = e SA w(t ) 


(4) 


in which 6 constitutes the time step. 

The basic operation in the above formula is the computation of the exponential of a given matrix 
times a vector. If we were able to perform this basic operation with high accuracy, we would have what 
is sometimes called a nonlinear one-step method [24], because it involves a nonlinear operation with the 
matrix A. We should stress that there is no need to actually evaluate the matrix exponential exp (— 6 A), but 
only its product with a given vector. This brings to mind an analogous situation for linear systems in which 
it is preferable to solve Ax = b than to compute A~ l and then multiply the solution by b. 

We point out that we follow an approach common in the literature [2, 39], putting the emphasis on 
the semi -discrete problem (2). As a result, our discussion of stability is purely from an Ordinary Differential 
Equation point of view and is not concerned with the effect of space discretization errors and convergence. 
We establish conditions under which our methods, applied to the stiff system of ODEs (2), satisfy certain 
criteria of stability which, in turn, is an important step toward any investigations of convergence. (See also 
[4. 37].) 

The Krylov subspace method presented here was introduced in [12] for general nonsymmetric ma- 
trices. However, similar ideas have been used previously in various ways in different applications for 
symmetric or skew-symmetric matrices. For example, we would like to mention the use of this basic idea 
in Park and Light [31] following the work by Nauts and Wyatt [27]. The idea of exploiting the Lanczos 
algorithm to evaluate terms of the exponential of Hamiltonian operators seems to have been first used in 
chemical physics by Nauts and Wyatt in the context of the Recuisive-Residue-Generation method [26]. More 
recently, Friesner et al. [8] have demonstrated that these techniques can be extended to solving nonlinear 
stiff differential equations. The approach developed in this paper is related to the work of Nour-Omid [29], 
in which systems of ODEs are solved by first projecting into Krylov subspaces and then solving reduced 
tridiagonal systems of ODEs; the approach of Tal-Ezcr and Kosloff [43]; and also the work of Tal-Ezer 
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[42] and Schaefer [38] on polynomial methods based on Chebyshev expansions. The idea of evaluating 
arbitrary functions of a Heimitian matrix with the use of the Lanczos algorithm has also been mentioned by 
van der Vorst [48]. The use of preconditioning for extending the stability interval of explicit methods, thus 
bringing them closer to fully implicit methods, has been discussed in [47, 34]. Although our method works 
from a subspace, it does not suffer from some of the aspects of partitioning methods (see, for example, 
[52]). Partitioning methods rely on explicitly separating and treating differently the stiff and nonstiff parts. 
However, it is usually impractical to confine stiffness to a subsystem [3]. The Krylov method, on the other 
hand, relies on the nice convergence property of Krylov approximations to essentially reach a similar goal in 
an implicit manner [36], The outermost eigenvalues, including the largest ones, will be well approximated 
by the Krylov subspace, so that the Krylov approximation to the matrix exponential will be accurate in 
those eigenvalues, thus accommodating stiffness. We also note that there have been several recent efforts to 
design agorithms for the solution of time-dependent problems, some of which may be particularly suited to 
parallel processing; see [45, 17, 19, 20, 40] and [46] for a review. 

The structure of our paper is as follows. In Section 2 we formulate the Krylov subspace approximation 
algorithm and prove some a priori error bounds. In Section 3 we present a method for the accurate 
approximation of the exponential of the Hessenbetg matrix produced in the course of the Amoldi or 
Lanczos algorithm. In Section 4 we consider problems with time-dependent forcing and introduce two 
approaches to handle the integration of the non-homogeneous term. We then proceed in Section 5 with a 
stability analysis of each approach in the context of the quadrature techniques used, leading to Theorem 5.1. 
In Section 6 we present numerical experiments for problems of varying difficulty, and finally, in Section 7 , 
our concluding remarks. 


2 Polynomial approximation and the use of Krylov subspaces 

In this section we consider using polynomial approximation to (4), that is, we seek an approximation of the 
form: 

e~ A v & p m -\(A)v, (5) 

where p m _ i is a polynomial of degree m - 1. There are several ways in which polynomial approximations 
ran be found. The simplest technique is to attempt to minimize some norm of the error e 2 — p m -\( z ) on a 
continuum in the complex plane that encloses the spectrum of A. For example, Chebyshev approximation 
can be used, but one disadvantage is that it requires some approximation to the spectrum of A. In this 
paper we consider only approaches that do not require any information on the spectrum of A. This will be 
considered in Section 2. 1 . A theoretical analysis will then follow in Section 2.2. 


2.1 The Krylov subspace approximation 

The approximation (5) to e~ A v is to be taken from the Krylov subspace 

K m = span{v, Av, A ro-1 v). 

Inoiderto manipulate vectors in if ro , it is convenient to generate anorthonoimal basis V m - ■ ■•!%]■ 

We will take as initial vector vi — u/ 1 j v j [2 and generate the basis with the well-known Arnold 1 algorithm , 

described below. 
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Algorithm: Arnoldi 

1. Initialize: 

Compute vi := v/||t>|| 2 . 

2. Iterate: Do j = 1 , 2, m 

1. Compute w := Avj 

2. Do * = 1,2, . . ,,j 

(a) Compute h{j := (w, r,) 

(b) Compute w := w - hijVi 

3. Compute hj+ ij = |M |2 and Vj+i = w/hj+ij. 

By construction the above algorithm produces an orthonormal basis V m — [t>i , ifc, . . . , u m ], of the 
Krylov subspace K m . If we denote the m x m upper Hessenbeig matrix consisting of the coefficients h t j 
computed from the algorithm by H m , we have the relation 

— Vm Hm "t* 1 Vm+1 ^m * (^) 

For the remainder of this discussion, for any given k, e* will denote the k A unit vector belonging to R m . 
From the orthogonality of the columns of V m we get that H m = V%AV m . Therefore H m represents the 
projection of the linear transformation A to the subspace K m , with respect to the basis V m . 

Since V m is orthonormal, the vector x opt = V m V£e~ A v is the projection of e~ A v on K m , that is, 
it is the closest approximation to exp(-A)r from K m . Since for /? = ||v|| 2 , we can write v = (3v j and 
i7j = V m e\, it follows that: 

V m v£e~ A v = PV m VZe~ A v u 

= pV m VZe- A V m e v 

We can thus write the optimal solution as x^t = V m yopt where yopt = pv£e~ A V m e\ . Unfortunately, y op t 
is not practically computable, since it still involves e~ A . We can approximate V^e~ A V m by e~ Hm , leading 
to the approximation y opt ~ (3e~ Hm e \ and 

e~ A v « (3V m e~ Hm e\. (7) 

From the practical point of view there remains the issue of efficiently computing the vector e~ Hm e\ 
which we address in Section 3. 

The approximation (7) is central to our method, and its effectiveness is discussed throughout the 
remainder of the paper. The next section is devoted to providing the theoretical justification. 

We also note that when A is symmetric, Amoldi’s algorithm simplifies into the Lanczos process, 
which entails a three-term recurrence. This is a result of the fact that the matrix H m = V^AV m must be 
symmetric and therefore tridiagonal symmetric, and so all J = 0 for i = 1,2, .., j - 2. However, the 
resulting vectors, which are in theory orthogonal to each other, tend to lose their orthogonality rapidly. 

2J2 A priori error bounds and general theory 

The next question that arises concerns the quality of the Krylov subspace approximation defined in Section 
2.1. A first observation is that the above approximation is exact for m = n, because in this situation 
t>m+i = o and (6) becomes AV m = V m H m , where V m is an n x n orthogonal matrix. In fact, similarly to 
the conjugate gradient method and the Amoldi process, the approximation will be exact for m whenever m 
is larger or equal to the degree of the minimal polynomial of »i with respect to A. As for these algorithms, 
we need to investigate what happens when m is much smaller than this degree. 
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In the sequel we will need to use the concept of the logarithmic norm of a matrix. Let B be a given 
matrix. The logarithmic norm p{.) is defined by: 


KB) 


lim 
h- o+ 


P + fc*l|-l 

h 


Note that p is associated with a particular norm. Unless it is otherwise specified, we assume that the reference 
norm is the usual 2-nonn. Then the logarithmic norm p(B) is equal to the maximum eigenvalue of the 
symmetric part of B, that is, T 

K B ) = -W(— -y— ■)• 

The function p satisfies many norm-like properties, but it can also take negative values. We refer to 
[5, 4] for a description of its properties. It can be shown in particular that 

||e B *|| < eK B K < 8 > 


We assume throughout that A is a real matrix. We now state the main theorem of this section. 


Theorem 2.1 Let A be any matrix and let p = ||A|| 2 , P = IMh and r/ = p(-A). Then the error of the 
approximation (7) is such that 

\\e~ A v - pV m e~ Hm ex\\i < 2 pp m <Kv) < 2^max(l,e”) (9) 


where 



The proof of the theorem is established in Appendix B. 

To see what one can gain in using the logarithmic norm instead of a standard spectral norm, compare 
Theorem 2.1 with the bound proposed earlier in [12]: 

||e- x » - fJV m e~ Hm ei Ik < ■ < 10 > 

For the sake of illustration let 

( 0.5000 -0.0938 0.0000 \ 

-0.4063 0.5000 -0.0938 

0.0000 -0.4063 03000 / 

and let A = I ® B + B ® I, where 0 is the symbol for the Kronecker product, and I the identity 

matrix Such an A arises from the discretization of tin + u w + f(u x + «v). when C = 10 -°- 111 11131 
case a(-A) = -0.2929, whereas ||A|| 2 = 1.7235. When m = 7, the bound for the remainder* obtained 
ftom Theorem 2.1 is 0.009, whereas the estimate from (10) is 0.0502, and the actual remainder norm is 
|| r7 (_ J 4 )j| 2 = 0.0066. Hence the use of the logarithmic norm results in an overshoot factor of only 1.3, in 
comparison to 7.5 when using the spectral norm. In general, the advantage of using the logarithmic norm 


1 . Note that for simplicity we are here concerned only with the remainder of the Taylor series for e A , and hence only with the 
||r m (4)||2 part of the bound. 
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follows from the inequality || A|| > p(-A) which is among the properties of p{.) (cf. [5]). One can construct 
examples however, for which the bounds from using p(-A) are as loose as those obtained from using \\A\\ 
(cf. [4]). Also, asymptotically the rates of convergence as estimated by the bounds (10) and (9) arc both of 
the form p/(m!) ,/,m . The following corollary follows trivially from Theorem 2.1. 

Corollary 2.1 If the eigenvalues of the symmetric part of the matrix A arc non-negative, then: 

\\e' A v - <2pQ. 


Hence the bound of Corollary 2.1 holds for many important classes of matrices including positive definite 
matrices and normal matrices with eigenvalues in the positive half-plane. 

We note that when A is normal, the bound of Corollary 2. 1 can be derived without invoking logarithmic 
norms because we can write A - QAQ H where A is the diagonal matrix of eigenvalues of A and Q is 
unitary. Then 

IM'Olk = II E = II E 

k=m * k=m 

From the assumption on A, 9?e [A] > 0. Applying component-wise a result of E. Landau [22], (cf. [33, p. 
35, problem 151]), the remainder can be bounded by its first term: 


IMA)|| 2 < 


i|-A™l| 2 _ l|A m l| 2 

ml ml 


The bound of Corollary 2.1 follows after using a similar treatment for H m and combining the results. 

When we know that A is symmetric positive definite, an even better bound can be obtained by 
applying the previous theory to A - (I, where C > VinCA) (the minimum eigenvalue of A). We refer to 
[11] for the proof. 


Theorem 22 Let A be a symmetric positive definite matrix and let p = ||A|| 2 and P - ||v|| 2 . Then the 
error of the approximation (7) is such that 

lle'^v- pV m e~ Hm ei \\2 < (11) 


These theorems show convergence of the approximation (7). They can also serve as a guide to 
choosing the step size in a time-stepping procedure. Indeed, if we were to replace A by the scaled matrix 
t A, then the Krylov subspace will remain the same; that is, V m will not change, and Hm will be scaled to 
T H m . As a result, for arbitrary r one can use the approximation 

e- TA vstpV m e~ THm eu ( 12 ) 

and the bound (9) becomes 

\\e~ rA v - pV m e~ Tlim e x || 2 < 2 /?(r/>) m #/i(-rA)). (13) 

The consequence of (13) is that by reducing the step-size one can always make the scheme accurate enough, 
without changing the dimension m. We note that these bounds are most useful when m is much larger than 
what is usually used by standard explicit methods. Indeed, in our experiments, we have used large values of 
m to our advantage. We refer the reader to [35] for additional results on error bounds for this method. 
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3 Practical computation of exp(— H m )e\ 

Wc now address the problem of evaluating y = e~ H e u where H is the Hessenberg or tridiagonal matrix 
produced by Amoldi’s method or the Lanczos method. We drop the subscript m for convenience. Although 
H is a small matrix, the cost of computing y can easily become non-negligible. For example, when H is 
tridiagonal symmetric, the simplest technique for computing y is based on the QR algorithm. However, this 
is rather expensive. We would thus like to use approximations which have high accuracy, possess desirable 
stability properties, and allow fast evaluation. The method we recommend is to use rational approximation 
to the exponential, evaluated by partial fraction expansion. This technique has been discussed in the context 
of implicit methods by the authors [12, 9], and we would like to take advantage of it in the present context 
The (serial) complexity of the QR algorithm is 0(m 3 ) for Hessenberg matrices and 0{m ) for tridiagonal 
matrices, compared with a cost of 0(m 2 ) for Hessenberg and O(m) for tridiagonal matnees when the 
rational approximation method is used. In addition, parallelism can be exploited in this approach. 

The rational approximation to the exponential has the form 




9*1 (*)’ 


(14) 


where and are polynomials of degrees v\ and respectively. 

An approximation of this type, referred to as a Pad d approximation, is determined by matching 
the Taylor series expansion of the left-hand-side and right-hand-side of (14) at the origin. Since Padd 
approximations are local, they are very accurate near the origin but may be inaccurate far away from it. 
Other schemes have been developed [49, 2, 16] to overcome this difficulty. For typical parabolic problems 
that involve a second order partial differential operator -L that is self-adjoint elliptic, the eigenvalues of 
L are located in the interval [0, +oo), and it is therefore natural to seek the Chebyshev (uniform) rational 
approximation to the function e~ z which minimizes the maximum error on the interval [0, +oo). To unify 
the Padd and uniform approximation approaches, we restrict ourselves to “diagonal” approximations of the 
form (i>, v), that is, in which the numerator has the same degree i/ as the denominator. We note however that 
alternative strategies (e.g., {v - 1 , v)) will frequently work better for Padd approximations, without altenng 
the principle of the method. Note that the stability properties of the aforementioned rational approximations 

are discussed extensively in the literature [51, 6, 18]. 

A comparison between the Padd approximation and the Oiebyshev rational approximation reveals the 
vast superiority of the latter in the context of the Krylov-based methods presented in this paper, at least for 
symmetric positive matrices H, and relatively large values of m. To see why this is so, we note that the idea 
of the method presented in this paper is to allow the use of large time steps by utilizing Krylov subspaces of 
relatively high order. However, for our method to be successful, the ability to use a large time step 6 must also 
cany over to the computation of exp(- H6)e x . We mentioned earlier that the Padd approximations provide 
good approximation only near the origin. Using the Chebyshev rational approximation to the function e 
over the interval [0, +oo) [1, 49], it becomes possible to utilize time steps as large as our Krylov-based 

method allows. . 

For example in the diagonal Chebyshev rational approximation, the infinity norm of the error over the 

interval [0, +oo) is of the order of 10 -, ° as soon as v reaches 10. For each additional degree the improvement 
is of the order of 9.289025... [1]. What this means is that for all practical purposes e * can be replaced by a 
rational function of relatively small degree. When H is nonsymmetric and its eigenvalues are complex, then 
the rational function is no longer guaranteed to be an accurate approximation to the exponential. Although a 
rigorous analysis is lacking, we experimentally verified that for the examples we treated, die approximation 
still remained remarkably accurate when the eigenvalues were near the positive real axis. Although little 
is known concerning rational uniform approximation in general regions of the complex plane, a promising 
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alternative is to use asymptotically optimal methods based on Faber transformations in the complex plane 
[7]. We also point out that there exist other techniques for approximating matrix exponentials by rational 
functions of A\ see, for example, [28, 16]. The restricted Padd approximations of [28] avoid complex 
arithmetic at the price of a reduced order of approximation, and reduced levels of parallelism caused by the 
occurrence of multiple poles. 

For compactness of notation in the diagonal approximations we will write simply R v from now on 
for the (v, v) rational approximation to e~ z . Then, in order to evaluate the corresponding approximation to 
e~ H e \ , we need to evaluate the vector y, where 

y = = q„(H)- l p„(H)e ,. (15) 

It has been proposed in several contexts that an efficient method for computing some rational matrix 
functions is to resort to their partial fraction expansions [9, 12, 10, 21, 23, 30, 41]. The approach is possible 
since it can be proved analytically that the diagonal Pad6 approximation to e z has distinct poles [53]. 
Explicit calculations indicate that this seems to be also true for the uni form approximation. In particular we 
write 


R„(z) = ao + S— “T 

Z “ A, 


where 


_ i _ Pv{ X i) * — . 1 O ,, 

fto — — ? and Oi = . ttt t — i , z, . . . , v 

* 1 / 


in which , k v arc the leading coefficients of the polynomials p„ and q„ respectively. 

With this expansion the algorithm for computing (15) becomes: 

Algorithm: 

1. Fort = l,2,...,^solve(^ - A ,J)y, = ei. 

2. Compute y = aoei + a. yj. 

The motivation in [9] forusing the above scheme was parallelism. The first step in the above algorithm 
is entirely parallel since the linear systems (H — A j J)y, = e\ can be solved independently from one another. 
The partial solutions are then combined in the second step. The matrices arising in [9] are large and sparse, 
unlike those of the present situation. However, parallel implementation of the above algorithm can be 
beneficial for small Hessenberg matrices as well. For example, in a parallel implementation of the Krylov 
scheme, the “Amdahl effect” may cause severe reduction in efficiency unless all stages of the computation 
were sufficiently parallelized. 

We should also point out that even on a scalar machine, the above algorithm represents the best way 
of computing y. It requires fewer operations than a straightforward use of the expression (15). It is also far 
simpler to implement. The poles A; and partial fraction coefficients a,- of R„(z) are computed once and for 
all and coded in a subroutine or tabulated, these are shown in Appendix A for v = 10 and v = 14 for the 
case of Chebyshev rational approximation. 


4 The case of a time-dependent forcing term 

In the previous sections we made the restrictive assumption that the function r in the right-hand side is 
constant with respect to time. In this section we address the more general case where t is time dependent. 
In other words, we now consider the system of ODEs of the form 
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-Aw(t) + r(f). 


(16) 


dw(t) _ 

dt ~ ' # ’ 

As is well known, the solution of this system is of the form 

w(t) = e~ tA wo + J e^~ t)A r(s)ds. 


Proceeding as in Section 1, we now express u>(f + 6) as 

w(t + 6) = e~ SA (w(t) + e~^ A r(s)dsJ 


= e * A w 


= € SA W 


ft 1-0 

(t) + J e-V +t -) A r(s)ds 
(t) + J* + T ) dT . 


(17) 


In one way or another, the use of the above expression as the basis for a time-stepping procedure will require 
numerical integration. Note, however, that under the assumption that we can evaluate functions of the form 
c~ Aa v accurately, we have transformed the initial problem into that of evaluating integrals. Simple though 
this statement may seem, it means that the concerns about stability disappear as soon as we consider that 
we arc using accurate approximations to the exponential. The reason for this is that the variable w does not 
appear in the integrand. The issue of stability will be examined in detail in Section 5. 

The next question we would like to address is how to evaluate the integral in (17); for this we consider 
two distinct approaches. 


4.1 The first approach 

To begin with, consider a general quadrature formula of the form. 



where the tj ’s are the quadrature nodes in the interval [0, f]. One of the simplest rules is the trapezoidal rule 
on the whole interval [0, £] which leads to 

io(t + £) = e _ 6 j 4 u?(t) + ^e - 5 y 4 r(t)+ -r(t + rf) 

= e- SA [w(t)+^r(t)] + lr(t + 6). 

The above formula is attractive because it requires only one exponential evaluation. On the other 
hand it may be too inaccurate to be of any practical value since it means that we may have to reduce the step 
size 6 drastically in order to get a good approximation to the integrals. The next alternative is to use a higher 
order formula, that is, a larger p in (18). For example we tried a Simpson formula instead of trapezoidal 
rule. The improvements are noticeable, but we have to pay the price of an additional exponential evaluation 
at the mid-point t + 6/2. 
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The recommended alternative is based again on a judicious exploitation of Krylov subspaces. In the 
formula (18) we note that each term e-( 6 - T ’ )A r(t + r,) need not be evaluated exactly. Observe that in the 
ideal situation where r(s ) is constant, equal to r, in the interval [t, t + £], then formula (12) shows that we 
can evaluate e~^~ r ^ A r for all t from the Krylov subspace generated for t = 0 (for example), via 

e -(S-r)A r ' w pv m e-( s - T)Hm e i, 

where V m and H m correspond to the Krylov subspace K m (A, r). In the more general case where r varies 
in the interval [f , t + £] we can use a projection formula of the form 

e -(t-r )A r ( t + T ) ~ V m e- (S - T)Hm V£r(t + t). (19) 

The combination of the quadrature formula (18) and formula (19) has been tested and was found 
to be remarkably accurate. Our experiment in Section 6.4 shows an example of a rapidly varying forcing 
term r(f ), where the method can perform adequately with only one additional exponential evaluation (at the 
midpoint). We note, however, that for some highly oscillatory forcing terms, a (preferably adaptive) scheme 
involving additional exponential evaluations or a reduction of the time step S may be needed. 


4.2 The second approach 

In the above approach we need to compute two Krylov subspaces: one associated with the current iterate 
w (t) and the other associated with r(t). We would like to show that we can reduce the computation to only 
one Krylov subspace. The resulting algorithm has different numerical properties from the one presented in 
Section 4.1. 

The main idea is to use the identity: 

e~ SA = 1- A I* e~ ,A ds = I - A [* e-^~ T)A dr, 

Jo Jo 

which is obtained readily by integration. This is then substituted in the first term of the right-hand side of 
the equation 

w(t + S) = e~ SA w{t) + f c -( 4 - T M r (t + r)dr (20) 

JO 

to obtain 

w(t + 6) = w(t) + f e -(«-*)4[ r (f + r )_ Aw(t)]dr. (21) 

Jo 

Note the important fact that the term e~ SA w(t), which was in the previously used formula (20), 
has been removed at the slight expense of modifying the function r(< + r) in the interval t € (0, 6). 
The modification consists of subtracting a vector that is constant in the interval of integration. In terms 
of computations, this modification requires one matrix-by-vector multiplication, certainly an inexpensive 
overhead, compared with that of applying the propagation operator to a vector. There is one fundamental 
difference between the scheme (20) used in the first approach and the scheme (21) of the second approach: 
the unknown function w now figures in the integrand. This may mean completely different numerical 
properties and, as is shown in Section 5, the loss of the unconditional stability. 
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5 Stability 

In this section we investigate the linear stability of the Krylov time-stepping methods when used for the 
solution of the semi-discrete system (2). As noted in the introduction, this discussion will not take into 

account the interaction between space and time discretizations. 

We first consider the stability properties of the approximate evolution operator in (7). If A is positive 
real 2 i.e., if its symmetric part 5 = |(A + A T ) is positive definite, then, so is the matrix H m [32]. Moreover 
the eigenvalues of A and H m have positive real parts and the smallest eigenvalue X m m(S) of the symmetric 
part of A is a lower bound for the eigenvalues of S m = j (H m + because S m = V m SV m . In 

terms of logarithmic norms, n(-H m ) < n(-A) < 0; see also Lemma B.3 in Appendix A. Since V m has 
orthonormal columns, the approximate evolution operator satisfies 

\\V m e- H "% < \\e- Hm %. 

As a result, we can state that in the case where exp (~H m 6) is evaluated exactly then 

|| V m e -*" 5 ||2 < \\ e - Hm % 

< e l*{-H m 6) < e n(-A6) 

< I- 


If, on the other hand, exp(-5 m <5) is not computed exactly, then stability will depend upon the method 
of evalusmon used. In particular, let exp(-5 m <5)be evaluated using a diagonal (v, v) Pad6 approximation R v . 
Diagonal Pad6 approximations are A-acceptable. From above, $tx H H m x > 0 for any x and < 0. 

We then obtain 

\\V m R v (H m S)\\ 2 < \\R v (H m S)\\ 2 < 1 


from a result of von Neumann which states that, when the field of values of a matrix B is contained 
in 7i, the nonnegative half of the complex plane, and if a rational function / maps 7i in the unit disk, 
then *11/(5) ||2 < 1; see also [44] and [13, Theorem 4]. A similar conclusion holds for the subdiagonal 
( 1 / - 1 , 1 /) approximation. When a diagonal Chebyshev approximation is used then this no longer holds as 
these approximations may amplify small eigenvalues of H m near zero. We can only state that for symmetric 
positive definite matrices and large enough values of v, and ^A min (5 m ) bounded away from zero then 


\\V m R v {H m 6)\\ 2 < 1. 

For the remainder of this section it will be 
computed exactly. 


assumed that the exponential terms exp(— H m 6) are 


5.1 Stability behavior of the first approach 
Consider a general solution scheme for (16) of the form 

U>n+1 = e ~' ASw n + (22) 

where s„ is some approximation to the integral (18). The above scheme is a one-step technique where 6 
is the time step. If we assume that the exponential term is exactly evaluated, then the above methods are 
referred to as the nonlinear multistep methods by Lee [24]. It was remarked in [25] that these methods are 
stable. More generally, let us assume that the error incurred in the evaluation of the term e w n in (22) is 
d „, while the error in the evaluation of the integral term s n is e 2 ,n- Then the recurrence (22) is replaced by 

w n+ i = e~ AS w n + ei >n + a* + e 2 ,„ (23) 


2. A matrix B is positive real if x T Bx > 0 for any real vector i ?^0 [50]. 
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in which s* is the exact integral (18). In this situation, the total error at each step is of the form 

e n+ i = w(t n+ i) — u>n+i = c~ AS e n + ei >n + ei <n . (24) 

This shows that if A has no eigenvalues in the negative half-plane of the complex domain, then the above 
procedure is stable. Note that this is independent of the procedure used to compute the approximation to 
the matrix exponential by vector product. If one uses the Krylov approximation to the exponential, then we 
essentially have an explicit procedure that is stable. Although this may seem like a contradiction, notice that 
we have made very special assumptions. The important point is that we arc essentially considering accurate 
one-step methods. In the extreme case where r is constant, the solution can be evaluated in just one step at 
any point in time provided the exponential is accurately approximated. 


5 J2 Stability behavior of the second approach 

As mentioned earlier the alternative approach described in Section 4.2 is attractive from the point of view 
of efficiency but may have poor numerical properties. We will outline in this section a stability analysis of 
this class of methods in an effort to determine how to select the quadrature formulas that are most likely to 
lead to robust procedures. 

We consider the simple time stepping scheme derived by applying a quadrature formula to the equation 

( 21 ) 

Wn+i = w n + 6 2 Pie~ {6 ~ Ti)A [r(t + r;) - Aw n ] (25) 

«=l 

where r, , i — 1 , k are the quadrature nodes in the interval [0, £] and m their corresponding weights. Once 
more, we assume that the exponential term in (25) is exactly calculated. The above equation can be recast 
in the form: 

w n+ i = - 6 Pie~ {6 ~ Ti)A Aj l»n + 9n 


in which g n is a term that does not contain the variable w n . The stability of the above recurrence is easily 
studied by replacing the matrix A by a generic eigenvalue A. This leads to the scalar recurrence: 


w n +i 





W„ + 9n- 


We need to determine under which conditions the modulus of the evolution operator 


k 

o( A) = 1 -£5> f e-<*- T ‘> A A 

t=i 


does not exceed one. 

Before proceeding with the more complicated general analysis, we first consider in detail two basic 
quadrature formulas: the trapezoidal rule and the mid-point rule. For the trapezoidal rule we have 

a( X)=l- 6 -[Xe- xs + X]. 

We restrict ourselves to the case where A is real and positive. We need to have 

-1 <a(A) = l-y[e- w + l]<l 
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( 26 ) 


or, since the second inequality is trivially satisfied, 

6\{e~ xs + 1] < 4. 

Since e~ xs < 1, a sufficient condition for the above inequality to hold is that 

6X <2, 

which is just as restrictive as an ordinary explicit method. Note that a necessary condition for (26) to be true 
is that 6 X < 4, 

For the mid-point rule, we have 

o(A) = 1 - 6Xe~ xs/2 . 

Considering again real and positive A, we will seek conditions under which we have 

- l < fl (A) = 1 - 6Xe~ xs ' 2 < 1. (27) 

The second inequality is always satisfied and from the first we get the condition 

X6e~ xs/2 < 2. 

As is easily seen through differentiation, the maximum with respect to A 6 of the left-hand side is reached 
for A 6 = 2, and its value is 2e -1 which is less than 2. Therefore, inequality (27) is unconditionally satisfied. 
This fund am ental difference between the trapezoidal rule and the mid-point rule underscores the change of 
behavior in the second approach depending on the quadrature rule used. We will extend this analysis shortly. 

The above development for the mid-point rule was restricted to A being on the positive real line. Let 
us consider this case in more detail for A complex. Setting u = A S = a — i0, we have 

a(A) = l-ue-“ /2 

= 1 - (a - t/?)e-<°- i/J)/2 

= 1 - ae -or/2 (cos(/?/2) + * sin(j3/2)) + i0e~ a/2 ( cos(/?/2) + i sin(/?/2)) 

= 1 - e~ a/2 ((ac + 0s) + i(as - 0c)) 

where we have set c = cos(/?/2) and s = sin(/?/2). The modulus of a(A) is easily found to satisfy 

|a(A)| 2 = 1 + e“"(a 2 + 0 2 ) — 2e~ a ^ 2 (ac + 0s) 

= l + e~ a (M 2 - 2 e a ' 2 (ac + 0s)) . 

This to the region of stability, symmetric about the positive real axis, defined by 

(a 2 + 0 2 ) < 2 e°' 2 (a cos(/?/2) + /3sin(/3/2)) . (28) 

Figure 1 shows shaded, the part of the complex domain [0,50] x [-25,25] which corresponds to 
values of tt satisfying (28). 

If we concentrate on the shaded region enveloping the positive real axis, we note that for large 
a, the limits of the curve bounding that section of the stability region are 0 = ±it. This is because for 
it <\0\< 2x, the coefficient eos(/?/2) becomes negative, making (28) impossible to satisfy/or large a, ( 0 
being fixed). On the other hand, for fixed 0 such that \0\ < *, we have cos(/?/2) > 0 and there will always 
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25 



Real u 


Figure 1 : Stability region for the mid-point rule (cf. (28)). 


be an a laige enough that will satisfy (28). In addition, all the lines /? = ±4fc7r , k = 0,1,2,.. will belong 
partly to the region: specifically all those points in these lines with a larger than the (only) positive root of 
x(2e x / 2 — x) = /3 2 are acceptable points. As shown in Figure 1, around each of these lines there is a whole 
subregion of stability, whereas in between, there are regions around the lines /? = ±2(2 k + 1 )x which are 
unstable. 

We now go back to studying the general scheme and extend the above analysis to the general case. 
Again we restrict ourselves to the case where A is real positive. This condition will be replaced by a weaker 
condition later. However we do not necessarily assume that the weights are positive. Then we examine the 

conditions under which 

k 

-1 < a(A) = 1 - 6 ^/z,e -( * -T * )A A < 1 

>=l 

or 

0 < *Ae- a J>,e T * A < 2. (29) 

t=i 

Consider now any quadrature rule that satisfies the following two conditions: 

1. Positivity condition: 

k 

]T^,e AT * >0, for A>0. (30) 

t=i 


15 


( 31 ) 


2. Undervaluation condition: 

r s k 

E k = I e Xa ds - > 0» for ^ > O- 

J ° T^\ 

The purpose of the positivity condition is to restrict the quadrature rule so that it yields nonnegative 
approximations to the integral of the function e A * on any positive interval. It is verified whenever the 
quadrature weights are nonnegative. In particular 

Lemma S.l The positivity condition is satisfied for any Gaussian quadrature formula. 

Proof The fact that the weights are positive for Gaussian quadrature is well known; see, e.g. [15, p. 328]. 

□ 

For the undervaluation condition, we can prove the following lemma. 


Lemma 5i 

j. Any composite or simple open Newton-Cotes formula satisfies the undervaluation condition (31). 
2. Any k-point Gaussian quadrature rule satisfies the undervaluation condition (31). 


Proof This is a consequence of the well-known error formulas for open Newton-Cotes rules [15, p. 
313-314], and for Gaussian quadrature rules [15, p. 330], and the fact that all the derivatives of the function 

e x ‘ are positive in the interval [0, £]. D 

Going back to the condition (29), we first observe under the positivity condition (30) the left-hand 
inequality is trivially satisfied. Moreover, under the undervaluation condition we have 



and as a result 


0 < 6Xe- sx p mt riX < \e~ sx = 1 - < 2. 


We have therefore proved the following result 


Theorem 5.1 Consider the time-stepping procedure (25) based on the second approach (Section 4.2) using 
a quadrature formula satisfying the positivity condition (30) and the undervaluation condition (31). Then 
the region of stability of this scheme contains the positive real line. 


Note that schemes with such properties are said to be Ao-stable in the literature [39]. In many of our 
numerical experiments, we have observed this difference in stability behavior between schemes that satisfy 
the conditions of the theorem and those that don’t. In many instances the composite closed-type Newton- 
Cotes formulas tended to diverge for a small number of subintervals. On the other hand we never noticed 
any stability difficulties with the open Newton-Cotes formulas or with the Gauss-Oiebyshev quadrature. 

As a general recommendation, it is advisable to use open Newton-Cotes formulas instead of closed 
formulas. Although these formulas satisfy the undervaluation condition according to the previous lemma, 
it is not known whether all of them satisfy the positivity condition (30). We do know that some of the low 
order, open Newton-Cotes rules (3-point, 4-point, and 6-point) do satisfy this condition since their weights 

are positive. 
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Gaussian rales are extremely attractive not only because of their stability properties but because of 
their potential to drastically reduce the number of function evaluations needed to produce a certain level of 
accuracy. There is still much work to be done to determine which of the quadrature formulas will yield the 

best results. 

As is suggested by the analysis of the mid-point rale for complex A, we expect the full analysis of the 
stability of the second approach to be very complicated for such cases. 


6 Numerical experiments 
6.1 A symmetric model problem 

Our first test problem is issued from the semi-discretization of the heat equation 


U X X + tlyy + u zzy x, y, z € (0,1) 

u = 0 on the boundary 


using 17 grid points in each direction, yielding a matrix of size iV = 15 3 = 3375. The initial conditions are 
chosen after space discretization, in such a way that the solution is known for all t. More precisely. 


«(0 ,Xi,yj,Zk) 


E 




1 

i'+j' + k' 


sin 


ii'r 

n+1 


sin 


jj 1 * 

n+1 


sin 


kk'v 

n+1' 


The above expression is simply an explicit linear combination of the eigenvectors of the discretized 
operator. In order to separate the influence of spatial discretization errors and emphasize the time evolution 
approximation, for the experiments in this section we consider the solution of the semi-discrete problem 
ut = -Att to be the exact solution. 

The purpose of the first test is to illustrate one of the main motivations for this paper, namely the 
effectiveness of using large dimensional Krylov subspaces whenever possible. As shown in [12, 9], similar 
conclusions also hold for methods based on rational approximations to the exponential. 

Assume that we want to integrate the above equation between t=0 and t=0.1, and achieve an error- 
norm at t = 0.1 which is less than e = 10 -10 . Here by error-norm we mean the 2-norm of the absolute 
error. 

We can vary both the degree m and the time-step 6. Normally we would prefer to first choose a 
degree m and then try to determine the maximum 6 allowed to achieve the desirable error level. However, 
for convenience, we proceed in the opposite way: we first select a step-size 6 and then determine the 
piinimum m that is needed to achieve the desirable error level. This experiment was performed on a Cray 
Y-MP. What is shown in Table 1 are the various time steps chosen (column 1) and the minimum needed 
values of m (column 2) to achieve an error norm less than c = 10 -, ° at t=0.1. We show in the third 
column the total number of matrix-by-vector multiplications required to complete the integration. The times 
required to complete the integration on a Cray Y-MP are shown in column 4. We also timed separately the 
evaluations of e~ SHm e\ and found these times to be negligible with respect to the rest of the computation. 
The last column of the table shows the type of rational approximation used when evaluating e~ SHm e x , with 
C(v, v) representing the diagonal (v, v) approximation and P(v, v) representing the diagonal (i/, v) Pad6 
approximation. 

Another point is that the matrix is symmetric, so we have used a Lanczos algorithm to generate the 
instead of the full Amoldi algorithm. No reorthogonalization of any sort was performed. The matrix 
consists of 7 diagonals, so the matrix by vector products are performed by diagonals resulting in a very 
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6 

m 

M-vec’s 

Time (sec) 

||^rror|| 2 

Method 

0.5000E-04 

6 

12006 

0.8173E+01 

0.1957E-11 

P(2,2) 

0.1000E-03 

7 

7007 

0.4793E+01 

0.3308E-10 

P(2^) 

0.5000E-03 

10 

2010 

0.1342E+01 

0.1800E-10 

P(4,4) 

0.1000E-02 

12 

1200 

0.7983E+00 

0.2260E-10 

P(4.4) 

0.5000E-02 

20 

400 

0.2672E+00 

0.527 IE- 10 

P(8,8) 

0.1000E-01 

26 

260 

0.1740E+00 

0.7247E-10 

P(8,8) 

0.2000E-01 

34 

170 

0.1080E+00 

0.3236E-10 

C(14,14) 

0.3000E-01 

39 

156 

0.9876E-01 

0.6362E-10 

C(14,14) 

0.4000E-01 

44 

132 

0.8030E-01 

0.4122E-10 

C(14,14) 

0.5000E-01 

49 

98 

0.5932E-01 

0.579 IE- 10 

C(14,14) 

0.1000E+00 

71 

71 

0.4186E-01 

0.9993E-10 

C(14,14) 


Table 1 : Performance of the polynomial scheme with varying accuracy on the Cray Y-MP. 


effective use of the vector capabilities of the Cray architecture. Based on the time for the last entry of the 
table, we have estimated that the average Mflops rate reached, excluding the calculation of e~ 8Hrn e\ , was 
around 220. This is achieved with little code optimization. 

Observe that the total number of matrix by vector products decreases rapidly as m increases. The 
ratio between the lowest degree m — 6 and the highest degree m = 71 is 169. The corresponding ratio 
between the two times is roughly 200. The case m = 71 can achieve the desired accuracy in just one step, 
that is, with 6 = 0.1. On the other hand for m = 6 a time-step of 6 = 5 x 10 -5 must be taken resulting 
in a total of 2000 steps. We should point out that we are restricting ourselves to a constant time-step, but 
more efficient variable time stepping procedures are likely to reduce the total number of steps needed. From 
the result of Theorem 2.1 and Theorem 2.2 these observations come with no surprise. In effect, increasing 
the dimension of the Krylov subspace will increase the accuracy in such a way that a much larger p (i.e., a 
larger £) can quickly be afforded. 


6 2 A nonsymmetric problem with time-varying forcing term 
In this section we consider the more difficult problem 


= Au(i, T oy’ l l + ,(*, V, 0 . ( 33 ) 

where A stands for the three-dimensional Laplacian operator with homogeneous boundary conditions and 
initial conditions: 


u(x, y, 2 , 0 ) = x(x - 1 )y(y - 1 ) 2(2 - 1). 

The function r is defined in such a way that the exact solution of the above partial differential equation 
is given by 


u(x,y,z,t) = 


g(g- i)y(y- !)*(*- *) 
l + t 


(34) 


This yields 

r(x,y, 2 ,t) 


x(x - l)y(y- 1 ) 2 ( 2 - 1) (2x — 1 )y(y- 1 ) 2 ( 2 - 1) 

(1-M) 2 7 1 + t 

2[y(y - i)*Q* - 1) + s(s - - 1) + *(g - i)y(y - ^3 

1 + t 
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6 

m 

Npts 

Mvec’s 

Time (sec) 

||^rror || 2 

0.2000E+00 

40 

60 

205 

0.2402E+01 

0.6151E-05 

0.1000E+00 

40 

40 

410 

0.3690E+01 

0.7483E-06 

0.1000E+00 

40 

30 

410 

0.3114E+01 

0.301 IE-05 

0.1000E+00 

35 

40 

360 

0.3177E+01 

0.7483E-06 

0.1000E+00 

30 

40 

310 

0.2617E+01 

0.7484E-06 

0.1000E+00 

25 

40 

260 

0.2110E+01 

0.8743E-06 

0.1000E+00 

25 

30 

260 

0.1721E+01 

0.1054E-04 

0.5000E-01 

25 

30 

520 

0.3413E+01 

0.7503E-07 

0.5000E-01 

25 

20 

520 

0.2726E+01 

0.396 IE-06 

0.5000E-01 

20 

20 

420 

0.2124E+01 

0.6163E-05 

0.5000E-01 

15 

20 

320 

0.1550E+01 

0.5463E-04 

0.2500E-01 

20 

10 

840 

0.2992E+01 

0.9015E-06 

0.2500E-01 

15 

20 

640 

0.3086E+01 

0.5327E-05 

0.2500E-01 

15 

10 

640 

0.2109E+01 

0.9887E-05 

0.1000E-01 

10 

10 

1100 

0.3561E+01 

0.1743E-05 

0.1000E-01 

7 

10 

800 

0.2693E+01 

0.9483E-05 


Table 2: Performance of the polynomial scheme with varying accuracy on the Cray-2. 


As in the previous example, we took the same number of grid points in each direction, i.e., n x — 
n y = n z = 17, yielding again a matrix of dimension N = 15 3 = 3375. This experiment was conducted 
on a Cray-2. Table 2 is the analogue of Table 1, except that we only report some representative runs with 
various values of m and 6. The parameter 7 is set equal to 10.0. The integration is carried out from t = 0.0 
to t — 1.0. The second approach was used in which the integrals were calculated with 1 1 -points composite 
(closed) Newton-Cotes formulas. In most cases we had to take more than 1 1 points, in which case we simply 
used a composite rule with a total number of points equal to 1 + k x 10. The third column reports the total 
number of subintervals Npts used to advance by one time step of 6. Thus, Npts is a multiple of 10. The 
time shown in the fifth column is the time in seconds to advance the solution from t = 0.0 to t = 1 . 0 , on 
a Cray-2. The sixth column shows the 2 -nonn norm of the error with respect to the exact solution of the 
continuous system, that is, with respect to (34). 

We observe that for larger tim e steps a larger number of quadrature points must be used to keep a 
good level of accuracy. We show the results associated with the smallest number of points for which there 
are no significant qualitative improvements in the error when we increase Npts, while keeping m and 6 
constant. Our tests indicate that the higher the order of the quadrature used, the better. This means that large 
gains in speed are still likely if we use more optimal, Gaussian quadrature formulas. A noticeable difference 
with the previous simple example is that while large values of m tend to reduce the total number of matrix 
by vector multiplications required, the reduction is not as substantial. 

6 3 A comparison with other methods 

Although an exhaustive comparison with other schemes is beyond the scope of this paper, we would like 
to give an idea on how the efficiency of the Krylov subspace propagation compares with some immediate 
contenders. The first of these contenders is simply the forward Euler scheme. This is an explicit scheme, and 
for not-too-small space mesh sizes, should not be excluded given that the corresponding process is highly 
vectorizable. However, an approach that may be far more challenging is to use an implicit scheme such as 
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the Crank-Nicolson method: 


(/ + ^ A)w n+ \ = (I - ^ A)w n + Sr(t n + S/2); (35) 

combined with an iterative method, for example the conjugate gradient method, for solving the linear 
systems. The main attraction here is that we can solve the linear systems inaccurately, making the solution 
process very inexpensive. From this viewpoint, this "'inexact Crank-Nicolson method shares many of the 
benefits of the Krylov method, as was already mentioned in the introduction. Finally, a well known stiff 
ODE package such as LSODE [14] is also considered. 

For this comparison we took the same problem as before, but we needed to take 7 = 0.0 in order to 
make the matrix A symmetric. This was necessary in order to be able to utilize the usual conjugate gradient 
algorithm for the linear systems in the Crank-Nicolson scheme. The r function is defined as before, and the 
number of grid points in each direction is again n x = n y - n, = 17, yielding N = 15 3 = 3375. 

We should point out that for the Crank-Nicolson method, we do not use preconditioning, and this is 
by no means a drawback. Because of time stepping, the matrix is usually very well conditioned, and as a 
result, the algorithm converges in a rather small number of steps. Moreover, because there is no need to 
solve the systems with high accuracy, the overhead in setting up the preconditioner would be difficult to 
amortize. Finally, the good preconditioners such as the incomplete factorizations do not generally yield a 
high a performance on vector machines. In our tests, the CG algorithm is stopped as soon as the residual 
norm is reduced by a factor which does not exceed a tolerance t. Wc always take the tolerance c that yields 
the smallest (or close to the smallest) time for the Crank-Nicolson scheme to complete. In this test we used 
the Chebyshev rational approximation of order ( 6 , 6 ) throughout, for the computation of e~ SHm e \ . A final 
point of de tail is that symmetry has been taken advantage of, both in Crank-Nicolson, which is able to use 
the usual conjugate gradient method, and in the Krylov method, in which we replaced the Amoldi algorithm 
with the Lanczos version. 

For LSODE we used the method flag MF=24, which means that a stiff method is used, and the 
Jacobian is user-supplied in banded format. The Cray-optimized Linpack banded solver is called to solve 
the linear systems. Table 3 shows the results. For LSODE we used a relative tolerance of rtol = 10 -14 
and an absolute tolerance of atol = 0 . 0 . 

This comparison reveals that the Krylov scheme is superior when one considers the number of 
matrix-by-vector products as the primary criterion. There are situations in which these may dominate the 
cost, in which case the execution time could be proportional to the number of matnx-vector products. When 
execution time is the primary criterion for comparison, then the Krylov scheme is still faster than Crank 
Nicolson but not by as large a margin. The Forward Euler scheme was unstable for the time step dt — 0.001 
midt = 0.00075. We also performed a set of tests with a larger version of problem corresponding to the grid 
sizes n x = n y = n z = 22, leading to a problem of size N = 8000. The conclusion is essentially the same 
in that Crank-Nicolson and the Krylov method are comparable, but the time for the explicit Euler scheme 
becomes much higher. We should add that we have regarded the problem purely from the angle of systems 
of ODEs, although we are aware that in practice a balanced accuracy between space and discretization is 
generally sought However, this would lead to comparisons that are too complex. 

6.4 A case with highly oscillating forcing term 

We consider here an example of the same form as in the previous subsection; that is, the general equation is 
of the form (33), and the initial and boundary conditions are identical. However, we now consider a forcing 
term for which the exact solution is given by 

u(x,y,z,t) = x(x - l)y(y - 1 )z(z - l)cos(airf). (36) 


20 


Method 

Method 1 

Matrix-vec. 

Total Cray-2 

Final 

used 

parameters 

products 

time (sec.) 

error 

Krylov 

m = 30, npts = 40 

155 

0.9374E+00 

0.6670E-05 

6 = 0.2 

m = 40, npts = 40 

205 

0.1225E+01 

0.6652E-05 


m = 35, npts - 40 

180 

0.1038E+01 

0.6672E-05 


m = 30, npts = 40 

155 

0.9355E+00 

0.6670E-05 


m = 20, npts = 40 

105 

0.6615E+00 

0.7103E-05 


m = 20, npts = 30 

105 

0.5229E+00 

0.1764E-04 

Krylov 

m = 25, npts = 30 

182 

0.9530E+00 

0.9367E-06 

6 = 0.15 

m = 20, npts = 30 

147 

0.7828E+00 

0.7185E-05 


m = 15, npts = 30 

112 

0.615 1E+00 

0.4244E-04 

Krylov 


210 

0.1044E+01 

0.7956E-06 

6 = 0.1 



0.9086E+00 

0.8574E-05 

Crank- 

dt = .01 , e = .001 

1053 

0.1192E+01 

0.1267E-05 

Nicolson 

dt = .005, f= .001 

1578 

0.1767E+01 

0.3329E-06 

F-Eulcr 


2000 

0.2779E+01 

0.8678E-06 

LSODE 

MF=24 

1077 

0.3766E+02 

0.4222E-04 


Table 3: Performance comparison of a few methods on Problem of Section 6.3. 


In other words, r is defined by 

r(x,y,z,t ) = -ax(x — l)y(y- 1)z(z — l)sin(axt) 

-2[y(y - l)z(z - 1) + x(* - 1 )z(z - 1) + x(x - 1 )y{y - 1) 

+7(2 x - 1 )y{y - l)z(z - l)]cos(ajrt). 

If the coefficient a is chosen to be large, then the problem can be difficult to solve. We took here 7 = 0.0 
and a = 20. The discretization mesh is the same as in the previous example. 

We compared the same four methods as those of the previous section, the Forward Euler scheme, the 
Crank-Nicolson/CG scheme, LSODE, and the Krylov method using the second approach. In this example 
LSODE failed to converge in a reasonable amount if time. 

One difference with the previous tests is that here we varied the quadrature formulas used. Thus 
npts = 4x8 indicates that we used a composite rule in which the interval of integration is first divided by 
4 and then on each subinterval a nine-point formula is used. Apart from this, all of the details concerning 
implementation are identical with those of Section 6.3, except that this time we used the Chebyshev rational 
approximation of order (8,8) instead of (6,6) to compute the vectors e~ SHm e \ . 

^ The results in Table 4 indicate that for this harder problem, the Krylov scheme performs far better 
than its competitors. The Crank-Nicolson scheme now requires smaller time steps to achieve acceptable 
accuracies. The forward Euler scheme would require a much smaller time step that those of the other 
methods to achieve comparable performance. 


•11- ^ ^ ^ + __ 7 . ? 

7 Summary and Conclusion 

:*J- 2 *i;**i ri '■ . *J-. HSh-wd HH: •==' f .'=== - : : ... r = > ‘ 

The goal of this paper was to show how to systematically develop explicit type schemes, or to use our 
terminology, polynomial schemes for solving parabolic partial differential equations by the method of lines. 
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Method 

Method 

Matrix-vec. 

Total Cray-2 

Final 

used 

parameters 

products 

time (sec.) 

error 

Krylov 

m = 40, npts = 3x10 

205 

0.1202E+01 

0.805 IE-04 

6 = 0.2 

m = 30, npts = 2x10 

155 

0.6902E+00 

0.2262E-03 


m = 30, npts = 8x5 

155 

0.1 196E+01 

0.2862E-04 


m = 25, npts = 20 x 2 

130 

0.1096E+01 

0.3188E-04 


m = 25, npts = 8x5 

130 

0.1063E+01 

0.2320E-04 

Krylov 

m = 20, npts = 2 x 10 

210 

0.1083E+01 

0.7585E-05 

6 = 0.1 

m = 15, npts = 10 x 2 

160 

0.8995E+00 

0.9713E-03 


m = 15, npts = 2 x 10 

160 

0.8739E+00 

0.698 8E-04 


m — 15, npts = 3x8 

160 

0.1043E+01 

0.1592E-04 


m = 15, npts = 4x8 

160 

0.1298E+01 

0.1757E-05 


m = 10, npts = 4x8 

110 

0.1066E+01 

0.3504E-04 

Crank- 


4322 

rag ftgl 

0.8816E-04 

Nicolson 


8000 


0.2203E-04 

F-Euler 

dt = .5E-03 

2000 

0.4780E+01 

0.235 8E-02 


dt = .IE-03 


0.2364E+02 

0.4712E-03 


dt = .5E-05 

3 

0.4861E+02 

0.2356E-03 

LSODE 

MF=24 

— 

— 

— 


Table 4: Performance comparison of a few methods for Problem of Section 6.4. 


We have proposed one such procedure that has the advantage of being very simple. The method proposed 
requires no information about the spectrum of the space discretization operator. We have recommended 
using high dimension Krylov subspaces whenever possible. By using a Krylov subspace of high dimension 
to approximate the evolution operator, we are able to use larger time-steps. At each step there is an additional 
cost due to the increased dimension of the Krylov subspace which translates into an increase in the number 
of matrix by vector multiplications. On the other hand, because of the larger time-step, the total number 
of steps required is reduced to such an extent that there is an appreciable net gain in performance. We 
have also proposed two approaches for handling non-constant forcing terms, with the view of extending 
these methods for general ODEs and nonlinear partial differential equations. The stability analysis of these 
approaches shows that the first is unconditionally stable and the second is Aq stable for a large class of 
integration schemes used. This has been widely confirmed by numerical experiments which indicate that 
the schemes proposed are competitive with standard methods such as Crank-Nicolson. 

Improvements to the approach described in Section 4.2 are possible by developing quadrature formulas 
that are more elaborate and specialized than the simple Newton-Cotes formulas used in our numerical 
experiments. We believe that the method proposed here can be extended to the solution of general time 
dependent nonlinear partial differential equations: the only subtlety is to isolate the action of the evolution 
operator, which is then well approximated by the schemes proposed here. 


A Appendix: Partial fraction coefficients 

In Table 5 we list some of the coefficients of the partial fraction expansion for the Chebyshev rational 
approximation to the exponential. These are the (fc, k) approximations for k = 10 and A: = 14. Note that 
because the roots go in complex conjugate pairs, we only need to show those with nonnegative imaginary 
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Degree 


Coef/Root 


10 


ao 

ai 

a 2 

«3 

04 


05 

Ai 

A 2 

a 3 

A4 

A 

ao 


14 


ai 

0 2 

03 

04 

05 

06 
07 


Ai 

A 2 

a 3 

A4 

As 

A6 

A7 


Real Part 

0.1361 1 2052334544905E-09 
0.963676398 1 67865499E+01 
-0. 1 4234330208 1 7947 1 8E+02 
0.5131 16990967461 106E+01 
-0.545173960592769901E+00 
0.1 15698077160221 179E-01 
-0.40277324675 1 880265E+01 
-0.32837528832316991 1E+01 
-0.171540601576881357E+01 
0.894404701 60948 1 378E+00 
0.516119127202031791E+01 
0. 1 83216998528 140087E-1 1 
0.557503973 136501 826E+02 
-0.938666838877006739E+02 
0.469965415550370835E+02 
-0.961424200626061065E+01 
0.75272206397832 1 642E+00 
-0.188781253158648576E-01 
0.14308643141 1801849E-03 
-0.562314417475317895E+01 
-0.5089346797282161 10E+01 
-0.399337 1 36365302569E+01 
-0.226978543095856366E+01 
0.208756929753827868E+00 
0.370327340957595652E+01 
0.889777151877331 107E+01 


Imaginary Part 


-0.4210919447678 15675E+02 
0. 1 76390663 1 57379776E+02 
-0.243277 14 1 223876469E+0 1 
0.284234540632477550E-01 
0.137170141788336280E-02 
0.1 19385606645509767E+01 
0.359438677235566217E+01 
0.603893492548519361E+01 
0.85827568986 1 307000E+01 
0. 1 137515625 19 165076E+02 


-0.204295038779771 857E+03 
0.912874896775456363E+O2 
-0.1 16167609985818103E+02 
-0.2641956 1 3880262669E+0 1 
0.670367365566377770E+00 
-0.3436961764458024 14E-01 
0.287221 1332288 14096E-03 
0.1 1940692161 1247440E-K)! 
0.35888243922837688 1E+01 
0.600483209099 604664E+0 1 
0.84617388 1758693369E-K11 
0.109912615662209418E+02 
0. 1 3656373 192499 1 8 84E+02 
0.166309842834712071E+02 


Table 5: Coefficients of the partial fraction expansion for degrees 10 and 14 


parts. In fact there are exactly [Jb/2] such roots for the (k, k ) approximation. Moreover in the case of a 
complex pair the corresponding coefficient o, in the partial fraction expansion is doubled. The roots are 
also distinct, and we can thus write for a real x: 


oq + Re 


L.=i 


x — A,- 


(37) 


B Appendix: Proof of Theorem 2.1 

The following lemma provides the basis for establishing error bounds for the error of the approximation (7). 

Lemma B.l Let A be any matrix, and p be any polynomial of degree smaller than m, approximating e 
with the remainder r m (z) = e~ x - p(z). Then, 

\\e~ A v - dV m e~ Hm ei || 2 < /3(||r m (A)|| 2 + ||r m (ff m )|| 2 ), (38) 
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where (3 = IMh- 


Proof As a result of the relation e~ z = p(z ) + r m (z), we have 

e~ A v = (3\p{A)v i + r m (A)vi]. (39) 

Using induction and the relation (6) we can show that A j vi = V m H ] m e\,ioT j < m - 1 and as a consequence 
we have 

p(A)v i = V m p{H m )e\. (40) 

As a result of the definition of p and r mt we can write 

p(H m )e i = e~ Hm e\ - r m (H m )ei. (41) 


To complete the proof, we substitute (41) in (40) and the resulting equation in (39) to get, 

e~ A v = fiV m e~ Hm e\ 

+ (3\r m (A)v i — VJnr m (ff m )ei]. 


The result follows immediately. 

Thus, the error can be estimated by bounding each of the two remainder terms. We now use the 
concept of the logarithmic norm of a matrix as defined in Section 2.2. We will specifically use the inequality 
||e B ‘|| < e^ B >‘. 

We next prove the following lemmas: 


Lemma B.2 Let 


, , y? (-*)* 


*=0 


be the (m—l)-thpartial Taylor sum ofe~ z and let r m (z)bethe associated remainder r m (z) = e z -s m _i(z). 
Define 

.iUri 

~<r\ h k ') 


4>(v) 


Then, 


||r m (A)|| < P w ||#i7) < ||A m || 


max(l,e r ’) 


ml 


(42) 


where tj = A). 


Proof The remainder after m terms of the Taylor series expansion in integral form applied to exp( - A) 
is given by 

and therefore, 

IM4)|| < (Irrf)! jl 
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Denoting rj = p(- A) for convenience, since 0 < r < 1, we have from Eq. ( 8 ): 

|| e ->t(l-T)|j < c m(-A)( 1-t) = e *?(l-r) 

from which we get 

IM4)II £ (44) 

The value of the integral in the above expression is determined by noting that the remainder of the ( m - 1 )-st 
Taylor expansion of e v satisfies 


v_ynl = a-— C e"V-^T m -'dT 
^ fc! (m- 1 )! Jo 


which gives 


<f>(v) = 



e „(l-T) r m-I dr 


Incidentally, this expression shows that 4>{t]) is nonnegative. Substituting this in (44) proves the first 
inequality in (42). 

To prove the second part of the inequality, we observe that 


*») = 7=Ti5i £ £ -XI * - !5? ^r !1 - 


(45) 


We would like to mention that the upper bound for used in the above lemma can be somewhat 
refined. More specifically, it can be shown that: 


<Kv) < \ 


i 

g 

mt 


e” 


(m— l)i}' 


T=T 


if 17 < 0 

ifO< t?< m ~V(m - 2)!m 
if — 2 )!m < rj 


Finally, we will need the following lemma: 

Lemma BJ If A is any real matrix and H m is the associated mxm upper Hessenberg matrix generated 
by m steps of the Amoldi algorithm, then: 

fi(-ffm) < P{~A). 


Proof By construction V m consists of m orthonormal vectors and H m satisfies H m = V^AV m . Since 
the maximum eigenvalues of the symmetric parts of A and H m can be characterized as the maximum values 
taken by their Rayleigh quotients, it easily follows that 


p(-H m ) = max A,(- 


VlAV m + VlA T V w 


) < max A<(— 


A + A 1 


■) = K-A). 
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Proof of Theorem 2.1. 

First note that as in Lemma B.2 we can show that 

IM#m)||2 < 

where p m = llffmlh = \\V£AV m \\ 2 . The right-hand side of the above inequality is an increasing function 
of p m and p m < P- From Lemma B.3, p(—H m ) < p(—A) = rj, and thus: 

IM# m )||2 < P m <Kv)- ( 46 ) 


Using Lemma B.l the proof follows. 
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